ABSTRACT

Censoring,  truncation  and  grouping  represent  different  but  related  forms 
of  incompleteness.  Methods  of  producing  kernel  functions  on  the  incomplete 
observations  are  proposed.  They  involve  substituting  for  or  averaging  over 
the  incomplete  observations.  Consistency  of  the  procedures  in  terms  of  the 
criterion  of  integrated  mean  squared  error  is  established  and  optimal  choice 
of  smoothing  parameter  is  achieved. 
SIGNIFICANCE AND EXPLANATION


When  data  are  used  to  estimate  a  probability  density  function,  either  a 
special  parametric  form  is  assumed  for  the  latter,  a  Normal  density  being  a 
common  particular  case,  or  a  nonparametric  method  is  employed.  One  such 
example  is  the  Kernel  method. 

For many problems data are available which are incomplete in some
sense. Three types of incompleteness are censoring (in which the exact values
of some observations are unknown), truncation (in which the data are known
exactly and also to be restricted to a certain range) and grouping, of which
one manifestation is data in the form of a histogram.

The basic kernel method relies on the data being "complete" and this
paper  gives  adaptations  to  cope  with  the  above  three  types  of  incompleteness. 
One  feature  of  density  estimation  by  the  kernel  method  is  the  need  to  choose, 
in  some  sensible  or,  if  possible,  optimal  way,  a  parameter  which  dictates  the 
smoothness  of  the  resulting  estimate.  A  formula  is  derived  for  the  value  of 
the  smoothing  parameter  which  is  optimal  according  to  one  particular 
criterion. 

Techniques  for  coping  with  incomplete  data  within  parametric  models  are 
well  established.  It  is  important  to  deal  with  such  problems  with 
nonparametric  methods  as  well  because,  although  no  parametric  model  may  be 
correct  for  a  given  application,  the  converse  is  true  for  nonparametric 
methods,  at  least  asymptotically. 


The  responsibility  for  the  wording  and  views  expressed  in  this  descriptive 
summary lies with MRC, and not with the author of this report.


KERNEL-BASED  DENSITY  ESTIMATION  USING  CENSORED, 

TRUNCATED  OR  GROUPED  DATA 


D.  M.  Titterington 
1.  INTRODUCTION

The problem of density estimation using censored, truncated or grouped
data is an important one. When a parametric model is acceptable, the problem
becomes one of parameter estimation and the maximum likelihood approach is
dealt with succinctly by Dempster et al (1977, Section 4.2). A maximum
likelihood approach to the nonparametric version of the problem is dealt with
by Turnbull (1974, 1976). This does not, however, lead to a smooth estimate
of the underlying probability density function. The object of this paper is
to propose methods for achieving this aim based on the kernel approach and to
investigate some asymptotic properties.

One condition has to be imposed, however, namely that some information
about the overall density be available. In the parametric case this is
supplied by the parametric family chosen. In the absence of this we shall
require that a set of $n_0$ observations be available which are quite
unaffected by the censoring, truncation or grouping mechanism. (If, for
instance, only grouped data are available, in the form of a histogram with
fixed bin size, then there is no hope of consistently estimating the density
everywhere without further information.) The incomplete data, therefore, may
be regarded as supplementary to the original $n_0$ observations.


The methodology will be similar to that of Titterington and Mill (1981),
who dealt with multivariate data with missing values. In what follows, only
univariate continuous data are considered.


2.  THE DATA AVAILABLE


2.1  Censoring 

Along with $n_0$ independent observations $\mathbf{x} = (x_1, \ldots, x_{n_0})$ from the
underlying distribution on a sample space, $X$, with probability density
function $f(\cdot)$, we have $\mathbf{y} = (y_1, \ldots, y_{n_1})$, $n_1$ independent observations, with
known values, in $A$, a subset of $X$, and $n_2$ independent observations known
to be in $\bar{A}$, the complement of $A$ in $X$.

We assume that, given $n_1 + n_2$, $n_1 \sim \mathrm{Bi}(n_1 + n_2, P(A))$, where

$$P(A) = \int_A f(x)\,dx ,$$

and that $n_0 = \theta_0 (n_0 + n_1 + n_2)$.

(The asymptotic results we obtain would hold also under the assumption
that, given $n_0 + n_1 + n_2 = n$, $n_0 \sim \mathrm{Bi}(n, \theta_0)$.)

2.2  Truncation 

Along with $\mathbf{x}$ we have $\mathbf{y} = (y_1, \ldots, y_{n_1})$, $n_1$ independent observations
from $A$, a subset of $X$. The p.d.f. for each of the $\{y_i\}$ is therefore

$$f(y)/P(A) \quad (y \in A) .$$


2.3  Grouping 

Along with $\mathbf{x}$ we have independent samples of sizes $n_1, \ldots, n_m$,
containing independent observations from members $A_1, \ldots, A_m$, respectively, of
a partition of $X$. Given $n_1 + \ldots + n_m$, the $n_k$'s are multinomial, with cell
probabilities $\{P(A_k)\}$, where

$$P(A_k) = \int_{A_k} f(x)\,dx .$$


3.  KERNEL-BASED  DENSITY  ESTIMATION  WITH 
INCOMPLETE  DATA 

Given a data-set $t_1, \ldots, t_n$ of $n$ independent identically
distributed observations, each with p.d.f. $f(x)$, $x \in X$, a kernel-based
density estimate of $f(x)$ takes the form

$$\hat{f}(x) = (nh)^{-1} \sum_{i=1}^{n} K((x - t_i)/h) ,$$

where $h$ is a smoothing parameter and the kernel function, $K(\cdot)$, is itself
a density, usually with its mode at zero. We shall assume that $K(\cdot)$ is
square-integrable and symmetric, with bounded first and second absolute
moments. Define

$$I_1 = \int u^2 K(u)\,du$$

and

$$I_2 = \int K^2(u)\,du .$$
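The definition above can be illustrated with a minimal sketch, assuming a Normal kernel (the paper allows any symmetric, square-integrable density). The function names `gaussian_kernel`, `kernel_density` and `trapezoid` are hypothetical helpers introduced here, and the constants $I_1$ and $I_2$ are approximated by simple trapezoidal quadrature rather than taken from the text.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard Normal density, used here as the kernel K (an assumption)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def kernel_density(x, data, h, kernel=gaussian_kernel):
    """Evaluate (n h)^{-1} * sum_i K((x - t_i) / h) at each point of x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    data = np.asarray(data, dtype=float)
    u = (x[:, None] - data[None, :]) / h
    return kernel(u).sum(axis=1) / (data.size * h)

def trapezoid(y, grid):
    """Plain trapezoidal rule on a one-dimensional grid."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(grid)))

# Numerical approximations to I_1 = int u^2 K(u) du and I_2 = int K(u)^2 du.
u = np.linspace(-10.0, 10.0, 20001)
I1 = trapezoid(u**2 * gaussian_kernel(u), u)
I2 = trapezoid(gaussian_kernel(u) ** 2, u)
```

For the Normal kernel these quadratures should agree closely with $I_1 = 1$ and $I_2 = (2\sqrt{\pi})^{-1}$.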

One  interpretation  of  our  basic  question  is  to  ask  what  to  use  for  the 
kernel  function  on  an  incomplete  observation.  In  the  spirit  of  Titterington 
and  Mill  (1981)  we  propose  two  possible  solutions. 

(A)  Plug in a "complete" data point for the incomplete one.

(B)  Average out the "incompleteness".

In the case of censored data, for instance, we require kernel functions
on the $n_2$ censored observations in $\bar{A}$. The corresponding p.d.f. is

$$f(z)/P(\bar{A}) \quad (z \in \bar{A}) ,$$

where $P(\bar{A}) = 1 - P(A)$.

Although this density is unknown we do have, from $\mathbf{x}$, an estimate

$$\hat{f}_0(z)/\hat{P}(\bar{A}) \quad (z \in \bar{A}) , \qquad (1)$$

where

$$\hat{f}_0(z) = (n_0 h)^{-1} \sum_{i=1}^{n_0} K((z - x_i)/h)$$

and

$$\hat{P}(\bar{A}) = \int_{\bar{A}} \hat{f}_0(z)\,dz .$$

As justified in Titterington and Mill (1981) we dismiss, for (A), the
"deterministic" mean-imputation procedure of plugging in the expected value
from (1) for each censored observation. Instead we use simulated values from
(1). We may use one value or, more generally, $r$ independent values,
$z_{i1}, \ldots, z_{ir}$, for the $i$th censored observation, giving the "kernel"

$$(rh)^{-1} \sum_{j=1}^{r} K((x - z_{ij})/h) .$$

It is natural that the averaging in method (B) be carried out using (1),
giving the following "kernel" on each censored observation.

$$\{h\hat{P}(\bar{A})\}^{-1} \int_{\bar{A}} K((x - z)/h)\,\hat{f}_0(z)\,dz . \qquad (2)$$

In practice this integral may well have to be evaluated numerically, in
contrast to what is possible in the missing-values problem (Titterington and
Mill, 1981). Direct simulation from (1) will also be awkward but here the
problem is eased in practice if we simulate from the density $\hat{f}_0(z)$
$(z \in X)$, which is a mixture density that should be easy for simulation, and
ignore all values not in $\bar{A}$.
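The simulate-and-discard idea just described might be sketched as follows. This is an illustration only: it assumes a Normal kernel, so that a draw from the mixture $\hat{f}_0$ is a randomly chosen complete observation plus $h$ times a standard Normal variate. The function name `sample_censored`, the predicate `in_A_bar` and the `max_batches` guard are all hypothetical.

```python
import numpy as np

def sample_censored(x, h, in_A_bar, size, rng, max_batches=1000):
    """Draw `size` values from f0_hat restricted to A-bar by rejection.

    x        : the n_0 complete observations
    in_A_bar : vectorised predicate, True where a value lies in A-bar
    """
    x = np.asarray(x, dtype=float)
    kept = []
    for _ in range(max_batches):
        # Mixture draw: pick a data point at random, add Normal kernel noise.
        centres = rng.choice(x, size=size)
        cand = centres + h * rng.standard_normal(size)
        # Ignore all candidate values that do not fall in A-bar.
        kept.extend(cand[in_A_bar(cand)].tolist())
        if len(kept) >= size:
            return np.array(kept[:size])
    raise RuntimeError("A-bar has too little mass under f0_hat to sample from")
```

With $r$ simulated values per censored observation, one would call this with `size = r * n2`.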

For truncated data, the "incomplete" observations are not so immediately
apparent. We introduce them deviously, as in Dempster et al (1977), by
proposing that, corresponding to the $n_1$ truncated observations, there
lurk $n_2$ observations in $\bar{A}$ to make up $n_1 + n_2$ altogether in $X$. Given
$n_1$, $n_2$ has a negative binomial distribution on $(0, 1, 2, \ldots)$, with

$$E(n_2 \mid n_1) = n_1 P(\bar{A}) P(A)^{-1} .$$

For each of these $n_2$ we generate kernels, as above, by simulating or
averaging. Joint simulation of $n_2$ and the corresponding $\{z_i\}$ is neatly
achieved by simulating from the p.d.f. $\hat{f}_0(z)$ $(z \in X)$ until $rn_1$ values in
$A$ have been generated and by regarding the remainder as the $rn_2$ extra
values in $\bar{A}$.
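The joint simulation for the truncated case can be sketched in the same spirit. The code below is a hedged illustration, again assuming a Normal kernel; `simulate_truncation_extras` and `in_A` are hypothetical names. It draws from $\hat{f}_0$ until $rn_1$ draws have landed in $A$ and returns the draws that fell outside $A$, which play the role of the extra values in the complement of $A$.

```python
import numpy as np

def simulate_truncation_extras(x, h, in_A, n1, r, rng):
    """Draw from f0_hat until r * n1 draws fall in A; return the other draws.

    The number of returned values is random; in the paper's notation it
    corresponds to r * n2, with n2 not observed directly.
    """
    x = np.asarray(x, dtype=float)
    target = r * n1
    hits = 0
    extras = []
    while hits < target:
        # One draw from the mixture density f0_hat (Normal kernel assumed).
        z = rng.choice(x) + h * rng.standard_normal()
        if in_A(z):
            hits += 1          # counts towards the r * n1 values in A
        else:
            extras.append(z)   # retained as a simulated value outside A
    return np.array(extras)
```

The loop terminates with probability one provided $A$ has positive mass under $\hat{f}_0$.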

It  should  now  be  clear  how  to  deal  with  grouped  data,  so  that  we  may  list 
the  following  proposals  for  density  estimates. 

3.1  Censoring 


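(A)

$$\hat{f}_A(x) = (n_0 + n_1 + n_2)^{-1} h^{-1} \Bigl[ \sum_{i=1}^{n_0} K((x - x_i)/h) + \sum_{i=1}^{n_1} K((x - y_i)/h) + r^{-1} \sum_{i=1}^{m_2} K((x - z_i)/h) \Bigr] \qquad (3)$$

where $(z_1, \ldots, z_{m_2})$ denote the simulated values (so that $m_2 = rn_2$ in the censoring case), a notation which fits in
better with the truncation case.

(B)

$$\hat{f}_B(x) = (n_0 + n_1 + n_2)^{-1} h^{-1} \Bigl\{ \sum_{i=1}^{n_0} K((x - x_i)/h) + \sum_{i=1}^{n_1} K((x - y_i)/h) + n_2 \hat{P}(\bar{A})^{-1} \int_{\bar{A}} K((x - z)/h)\,\hat{f}_0(z)\,dz \Bigr\} . \qquad (4)$$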
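A minimal numerical sketch of the two censored-data estimates, (3) and (4), is given below. It is not the author's implementation: it assumes a Normal kernel, evaluates the integral in (4) by trapezoidal quadrature on a user-supplied grid covering the censoring region (truncated where the estimate is negligible), and uses hypothetical names (`K`, `ksum`, `f_A`, `f_B`).

```python
import numpy as np

def K(u):
    """Normal kernel (an assumption; any symmetric density could be used)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def ksum(x, pts, h):
    """sum_i K((x - pts_i) / h), evaluated at each entry of x."""
    x = np.asarray(x, dtype=float)
    pts = np.asarray(pts, dtype=float)
    return K((x[:, None] - pts[None, :]) / h).sum(axis=1)

def trapezoid(y, grid):
    """Trapezoidal rule along the last axis."""
    return np.sum(0.5 * (y[..., 1:] + y[..., :-1]) * np.diff(grid), axis=-1)

def f_A(x, xs, ys, zs, n2, r, h):
    """Formula (3); zs holds the r * n2 simulated values."""
    n = len(xs) + len(ys) + n2
    return (ksum(x, xs, h) + ksum(x, ys, h) + ksum(x, zs, h) / r) / (n * h)

def f_B(x, xs, ys, n2, h, grid):
    """Formula (4); grid is a fine grid covering the censoring region."""
    x = np.asarray(x, dtype=float)
    n = len(xs) + len(ys) + n2
    f0 = ksum(grid, xs, h) / (len(xs) * h)      # f0_hat on the grid
    p_hat = trapezoid(f0, grid)                 # estimated mass of the region
    integrand = K((x[:, None] - grid[None, :]) / h) * f0[None, :]
    averaged = trapezoid(integrand, grid) / p_hat
    return (ksum(x, xs, h) + ksum(x, ys, h) + n2 * averaged) / (n * h)
```

Both functions return values of a proper density in $x$: each of the three bracketed terms integrates to $h$ times the corresponding sample size.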


3.2  Truncation 

Formulae (3) and (4) are again relevant. It must be remembered that,
given $n_1$, $n_2$ is the realization of a negative binomial random variable, as
discussed above.

3.3  Grouped  data 


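(A)

$$\hat{f}_A(x) = \Bigl(n_0 + \sum_{k=1}^{m} n_k\Bigr)^{-1} h^{-1} \Bigl\{ \sum_{i=1}^{n_0} K((x - x_i)/h) + r^{-1} \sum_{k=1}^{m} \sum_{i=1}^{n_k} \sum_{j=1}^{r} K((x - z_{ij}^{(k)})/h) \Bigr\} \qquad (5)$$

(B)

$$\hat{f}_B(x) = \Bigl(n_0 + \sum_{k=1}^{m} n_k\Bigr)^{-1} h^{-1} \Bigl\{ \sum_{i=1}^{n_0} K((x - x_i)/h) + \sum_{k=1}^{m} n_k \hat{P}(A_k)^{-1} \int_{A_k} K((x - z)/h)\,\hat{f}_0(z)\,dz \Bigr\} \qquad (6)$$

In (5), the $\{z_{ij}^{(k)}\}$ are independent, with p.d.f.'s
$\hat{f}_0(z)/\hat{P}(A_k)$ $(z \in A_k)$, for each $i, j, k$, with

$$\hat{P}(A_k) = \int_{A_k} \hat{f}_0(z)\,dz .$$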


4.  ASYMPTOTIC  RESULTS 


In this section we establish consistency of the density estimators under
the criterion of integrated mean squared error and derive optimal values for
the smoothing parameter, $h$. Specifically, we show that, for suitably
defined $n$,

$$\int E\{\hat{f}(x) - f(x)\}^2\,dx = \tfrac{1}{4} H^2 h^4 + G n^{-1} h^{-1} + o(h^4 + n^{-1} h^{-1}) , \qquad (7)$$

for certain constants $H$ and $G$. The dominant terms in (7) are minimized by

$$h^* = (G H^{-2} n^{-1})^{1/5} \qquad (8)$$

and, under this choice, the right hand side of (7), of order $O(n^{-4/5})$, tends
to zero as $n \to \infty$. The calculations involved are similar to those of
Rosenblatt (1956) and Epanechnikov (1969).
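The step from (7) to (8) is a routine minimization, written out here as a short worked derivation for convenience. It uses only the two dominant terms of (7) and takes $H > 0$; the constant $\tfrac{5}{4}$ in the last line is a consequence of this calculation and is not quoted from the text.

```latex
\begin{align*}
\frac{d}{dh}\Bigl(\tfrac14 H^2 h^4 + G n^{-1} h^{-1}\Bigr)
  &= H^2 h^3 - G n^{-1} h^{-2} = 0
  \quad\Longrightarrow\quad (h^*)^5 = G H^{-2} n^{-1}, \\
\tfrac14 H^2 (h^*)^4 + G n^{-1} (h^*)^{-1}
  &= \tfrac14 G^{4/5} H^{2/5} n^{-4/5} + G^{4/5} H^{2/5} n^{-4/5}
   = \tfrac54 \,(G^2 H n^{-2})^{2/5} = O(n^{-4/5}).
\end{align*}
```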


Note that

$$\int E(\hat{f}(x) - f(x))^2\,dx = \int (E\hat{f}(x) - f(x))^2\,dx + \int \operatorname{var} \hat{f}(x)\,dx . \qquad (9)$$

The dominant terms in (7) come from these two constituent parts, which we
evaluate below. For all three types of incompleteness we may observe that,
conditioning on $\mathbf{x}$, $\mathbf{y}$ (in the cases of censoring and truncation) and all
sample sizes $\mathbf{n}$, averaging over the simulated data, $\mathbf{z}$, gives

$$E_{\mathbf{z}} \hat{f}_A(x) = \hat{f}_B(x) , \quad \text{for all } x .$$

Thus, unconditionally,

$$E\hat{f}_A(x) = E\hat{f}_B(x)$$

and

$$\operatorname{var} \hat{f}_A(x) = E_{\mathbf{x},\mathbf{y},\mathbf{n}} \operatorname{var}_{\mathbf{z}} \hat{f}_A(x) + \operatorname{var}_{\mathbf{x},\mathbf{y},\mathbf{n}} \hat{f}_B(x) . \qquad (10)$$


Thus, almost certainly, $\hat{f}_A(\cdot)$ will not be as efficient as $\hat{f}_B(\cdot)$,
although, as we shall see, its comparative ease of application may make it the
preferred method in practice.

In the Appendix, the case of censored data is dealt with in detail, with
the results that $H$ is the same for both method (A) and method (B), whereas
the values of $G$ are different. $G_A$ and $G_B$ are given by equations (A.8)
and (A.7), respectively.

Exactly the same results will hold for the truncated-data case except
that the value $n$ in (7) has to be interpreted differently. In practice
$N = n_0 + n_1$ will be known and $n$ is to be interpreted as the total sample
size, given $n_1$ and $n_0$. Thus, formulae in terms of $N$ can be obtained by
substituting from

$$n = N\{\theta_0 + (1 - \theta_0)/P(A)\}$$

in the results for censored data.

Calculations for the case of grouped data give

$$E\hat{f}_A(x) = E\hat{f}_B(x) = f(x) + b_h(x) + (1 - \theta_0)\{b_h(x) - P(A(x))^{-1} B_h(A(x)) f(x)\} + o(h^2) ,$$

where $A(x)$ is the grouping interval containing $x$.

In the third term we have followed the approximation leading to equations
(A.4). Also,

$$\int \operatorname{var} \hat{f}_B(x)\,dx = (nh)^{-1}\{\theta_0 I_2 + 2(1 - \theta_0) I_4 + (1 - \theta_0)^2 I_3/\theta_0\} + o(n^{-1} h^{-1}) ,$$

where $I_4$ and $I_3$ are defined in the Appendix, and

$$\int \operatorname{var} \hat{f}_A(x)\,dx = \int \operatorname{var} \hat{f}_B(x)\,dx + (nh)^{-1}(1 - \theta_0) I_2/r + o(n^{-1} h^{-1}) .$$


5.  SOME  NUMERICAL  RESULTS 


When there is a substantial amount of censored or truncated data to
supplement $\mathbf{x}$, the density estimator which incorporates them should be better
than that based on $\mathbf{x}$. We present some numerical results for the case of
censored data from a standard Normal distribution, using a Normal kernel
function, for which $I_1 = 1$, $I_2 = (2\sqrt{\pi})^{-1}$, $I_3 = (\sqrt{6\pi})^{-1}$ and $I_4 = \tfrac{1}{2} I_2$.

With the optimal choice, $h^*$, for the smoothing parameter, the dominant
term in (7) is

$$S \propto (G^2 H n^{-2})^{2/5} .$$

If only the complete data are used, then the corresponding value is

$$S_0 \propto (G_0^2 H_0 n_0^{-2})^{2/5} ,$$

where, effectively, $n_0 = n\theta_0$, $G_0 = I_2$ and

$$H_0^2 = I_1^2 \int_{-\infty}^{\infty} \{f''(x)\}^2\,dx .$$

Of interest is the ratio

$$R = (S/S_0)^{5/2} = G^2 H \theta_0^2 / (G_0^2 H_0) .$$

Since $G_0 = I_2$,

$$R = F^2 H \theta_0^2 / H_0 ,$$

where

$$F_B = \theta_0 + (1 - \theta_0)\{P(A) + P(\bar{A})(\theta_0^{-1}(1 - \theta_0) I_3/I_2 + 2 I_4/I_2)\}$$

and $F_A = F_B + (1 - \theta_0) P(\bar{A})/r$.

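As an illustrative simple example take $A = (-\infty, 0)$. Then, from the
Appendix,

$$H^2 = \int_{-\infty}^{\infty} H^2(x)\,dx ,$$

where

$$H(x) = \begin{cases} I_1 f''(x) & (x \in A) \\ I_1\{(2 - \theta_0) f''(x) - 2(1 - \theta_0) f(x) \int_0^{\infty} f''(y)\,dy\} & (x \in \bar{A}) . \end{cases}$$

Since $\int_0^{\infty} f''(x)\,dx = 0$, we obtain

$$H^2 = H_0^2\{1 + (2 - \theta_0)^2\}/2 ,$$

so that

$$R = F^2 \theta_0^2 [\{1 + (2 - \theta_0)^2\}/2]^{1/2} .$$

In particular, since $P(A) = \tfrac{1}{2}$,

$$F_B = \theta_0 + (1 - \theta_0)\{1 + \theta_0^{-1}(1 - \theta_0)/\sqrt{6}\} = 1 + \theta_0^{-1}(1 - \theta_0)^2/\sqrt{6}$$

and

$$F_A = F_B + (1 - \theta_0)/2r .$$

Thus

$$R_B = \{\theta_0 + (1 - \theta_0)^2/\sqrt{6}\}^2 [\{1 + (2 - \theta_0)^2\}/2]^{1/2}$$

$$R_A = \{\theta_0 + (1 - \theta_0)^2/\sqrt{6} + \theta_0(1 - \theta_0)/2r\}^2 [\{1 + (2 - \theta_0)^2\}/2]^{1/2} .$$

Values of $R_A$ for various values of $r$ and $\theta_0$ are displayed in Table 1.
The row for $r = \infty$ corresponds to $R_B$. $R_A$ can be close to $R_B$ for only a
small value of $r$, a phenomenon reported also by Titterington and Mill
(1981). Thus, although method (B) is in principle to be preferred, method (A)
can easily be almost as good, as well as being much easier to apply.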
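The two ratios can be evaluated directly, as the short sketch below shows. The function names `R_B` and `R_A` are hypothetical, and the term `theta0 * (1 - theta0) / (2 * r)` is an assumption read off from $F_A = F_B + (1 - \theta_0) P(\bar{A})/r$ with $P(\bar{A}) = \tfrac{1}{2}$. The code only evaluates these closed forms; it is not a reproduction of Table 1.

```python
import math

def R_B(theta0):
    """Ratio for method (B) with A = (-inf, 0), Normal data and Normal kernel."""
    f_theta = theta0 + (1.0 - theta0) ** 2 / math.sqrt(6.0)   # theta0 * F_B
    return f_theta ** 2 * math.sqrt((1.0 + (2.0 - theta0) ** 2) / 2.0)

def R_A(theta0, r):
    """Ratio for method (A) with r simulated values per censored observation."""
    f_theta = (theta0 + (1.0 - theta0) ** 2 / math.sqrt(6.0)
               + theta0 * (1.0 - theta0) / (2.0 * r))         # theta0 * F_A
    return f_theta ** 2 * math.sqrt((1.0 + (2.0 - theta0) ** 2) / 2.0)

if __name__ == "__main__":
    for theta0 in (0.25, 0.5, 0.75):
        row = [R_A(theta0, r) for r in (1, 2, 5, 10)]
        print(theta0, [round(v, 3) for v in row], round(R_B(theta0), 3))
```

As $r$ grows, `R_A(theta0, r)` decreases towards `R_B(theta0)`, which is the sense in which the row for $r = \infty$ corresponds to method (B).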


6.  DISCUSSION 

We  end  with  the  following  comments. 

(i)  Although  the  censoring  and  truncation  requirements  are  very  simple, 
there  is  difficulty  in  extending  the  analysis  to  more  complicated  ones,  based 
on  a  partition  of  X,  on  the  lines  of  Dempster  et  al  (1977,  Section  4.2). 

(ii)  In  practice  some  data-based  method  may  be  required  for  choosing  the 
smoothing  parameter,  h.  The  formula  given  by  (8)  depends  on  the  unknown 
density  itself.  A  useful  reference  is  Scott  and  Factor  (1981). 
(iii)  It has to be admitted that some of the gains embodied in Table 1
are not remarkable. However, the methods of the paper should be valuable as
nonparametric imputation procedures (particularly method (A) with $r = 1$). In
many sample survey projects with non-response it is desirable to impute the
missing values in such a way as to provide a "fair" complete data-set. Given
that the statistical characteristics underlying the incompleteness process are
as described in the paper, method (A) will certainly achieve this aim.

(iv)  Only the case of fixed Type I censoring has been considered here.
The same methods can be applied to random censoring and consistency will
obtain, provided we have a data-set $D_0$ which is known not to have been
subject to the possibility of censoring. In many problems involving random
censoring such a $D_0$ is not available and to use the uncensored data we have,
which would correspond to $D_1$, for imputation or averaging would almost
certainly lead to bias. For this case methods have been developed for
smoothing the nonparametric Kaplan-Meier estimate of the survival curve; see
Földes and Rejtő (1981) and Yandell (1981).

APPENDIX.  CALCULATION OF INTEGRATED MEAN SQUARED
ERROR FOR CENSORED DATA CASE.

Once the first term on the right hand side of (10) is dealt with, the
remaining calculations are all related to $\hat{f}_B(x)$ as given by (4).

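From (3),

$$\operatorname{var}_{\mathbf{z}} \hat{f}_A(x) = (n_0 + n_1 + n_2)^{-2} n_2 \{r\hat{P}(\bar{A})\}^{-1} \Bigl\{h^{-2} \int_{\bar{A}} K^2((x - z)/h)\,\hat{f}_0(z)\,dz + o(h^{-1})\Bigr\}$$

$$\simeq (n_0 + n_1 + n_2)^{-2} n_2 \{rP(\bar{A})\}^{-1} \Bigl\{h^{-2} \int_{\bar{A}} K^2((x - z)/h)\,f(z)\,dz + o(h^{-1})\Bigr\} .$$

If, given $n_1 + n_2$, $n_2 \sim \mathrm{Bi}(n_1 + n_2, P(\bar{A}))$ and if, given $n_0 + n_1 + n_2 = n$,
$n_0 = \theta_0 n$ (or $n_0 \sim \mathrm{Bi}(n, \theta_0)$), then the dominant term in $E \operatorname{var}_{\mathbf{z}} \hat{f}_A(x)$, for
use in (10), is

$$(1 - \theta_0)(nrh^2)^{-1} \int_{\bar{A}} K^2((x - z)/h)\,f(z)\,dz . \qquad (A.1)$$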


(An unqualified "E" or "var" will be assumed to involve averaging over
any random variation not so far accounted for.)

This leads to the following contribution to (9).

$$\int E \operatorname{var}_{\mathbf{z}} \hat{f}_A(x)\,dx = (1 - \theta_0)(nrh^2)^{-1} \int \int_{\bar{A}} K^2((x - z)/h)\,f(z)\,dz\,dx .$$

Substitution of $x$ by $z - uh$ gives

$$(1 - \theta_0)(nrh)^{-1} \int_{\bar{A}} f(z)\,dz \int K^2(u)\,du = (1 - \theta_0)(nrh)^{-1} P(\bar{A}) I_2 .$$


We now concentrate on $\hat{f}_B(x)$ from (4).
In the notation of Silverman (1978),

$$\hat{f}_0(x) = f(x) + b_h(x) + \sigma_h(x) , \qquad (A.2)$$

where

$$b_h(x) = \tfrac{1}{2} h^2 I_1 f''(x) + o(h^2)$$

and $\sigma_h$ is a zero-mean Gaussian process with variance function
$(n_0 h)^{-1} I_2 f(x) + o(n_0^{-1} h^{-1})$, given $n_0$. Then

$$\hat{P}(\bar{A}) = P(\bar{A}) + B_h(\bar{A}) + S_h(\bar{A}) ,$$

where

$$B_h(\bar{A}) = \int_{\bar{A}} b_h(x)\,dx$$

and

$$S_h(\bar{A}) = \int_{\bar{A}} \sigma_h(x)\,dx .$$


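Thus

$$\{h\hat{P}(\bar{A})\}^{-1} \int_{\bar{A}} K((x - z)/h)\,\hat{f}_0(z)\,dz$$

$$\simeq \{hP(\bar{A})\}^{-1}\{1 - (B_h(\bar{A}) + S_h(\bar{A}))/P(\bar{A})\} \int_{\bar{A}} K((x - z)/h)(f(z) + b_h(z) + \sigma_h(z))\,dz$$

$$\simeq P(\bar{A})^{-1}\Bigl\{h^{-1} \int_{\bar{A}} K((x - z)/h)\,f(z)\,dz + \beta_h(x) + \gamma_h(x)\Bigr\} ,$$

where

$$\beta_h(x) = h^{-1} \int_{\bar{A}} K((x - z)/h)\,b_h(z)\,dz - B_h(\bar{A})\{hP(\bar{A})\}^{-1} \int_{\bar{A}} K((x - z)/h)\,f(z)\,dz \qquad (A.3)$$

and

$$\gamma_h(x) = h^{-1} \int_{\bar{A}} K((x - z)/h)\,\sigma_h(z)\,dz - S_h(\bar{A})\{hP(\bar{A})\}^{-1} \int_{\bar{A}} K((x - z)/h)\,f(z)\,dz .$$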

When taking expectations over the sample sizes, the dominant term is
obtained simply by inserting expected values, $n\theta_0$ for $n_0$, $n(1 - \theta_0)P(A)$
for $n_1$ and $n(1 - \theta_0)P(\bar{A})$ for $n_2$. Variances over the sample sizes will be
of order $O(n^{-1})$, which is $o(n^{-1} h^{-1})$ and $o(h^2)$, for $h$ of the order we
shall use, namely $O(n^{-1/5})$. These variances may therefore be neglected.


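It follows that, if all but the dominant terms are neglected,

$$E\hat{f}_B(x) \simeq n^{-1}\Bigl[n\theta_0\{f(x) + b_h(x)\} + n(1 - \theta_0)P(A)\{hP(A)\}^{-1} \int_A K((x - y)/h)\,f(y)\,dy$$

$$\qquad + n(1 - \theta_0)P(\bar{A})\{P(\bar{A})\}^{-1}\Bigl\{h^{-1} \int_{\bar{A}} K((x - y)/h)\,f(y)\,dy + \beta_h(x)\Bigr\}\Bigr]$$

$$= n^{-1}\Bigl[n\theta_0(f(x) + b_h(x)) + n(1 - \theta_0)h^{-1} \int K((x - y)/h)\,f(y)\,dy + n(1 - \theta_0)\beta_h(x)\Bigr]$$

$$= f(x) + b_h(x) + (1 - \theta_0)\beta_h(x) + o(h^2) .$$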

Note, from (A.3), that $\int \beta_h(x)\,dx = 0$. Also, for small $h$, it is
approximately true that

$$\beta_h(x) = \begin{cases} 0 & (x \in A) \\ b_h(x) - B_h(\bar{A})P(\bar{A})^{-1} f(x) & (x \in \bar{A}) . \end{cases} \qquad (A.4)$$


If we define $H(x)$ by

$$\tfrac{1}{2} h^2 H(x) = b_h(x) + (1 - \theta_0)\beta_h(x) , \qquad (A.5)$$

then

$$\int \{E\hat{f}_B(x) - f(x)\}^2\,dx = \tfrac{1}{4} h^4 H^2 + o(h^4) ,$$

where $H^2 = \int H^2(x)\,dx$. The approximation in (A.4) will be useful in
calculating $H^2$.

The main remaining calculation is to evaluate

$$\operatorname{var}_{\mathbf{x},\mathbf{y},\mathbf{n}} \hat{f}_B(x) ,$$

into which we shall substitute mean values for $n_0$, $n_1$ and $n_2$.

The dominant term in the variance over $\mathbf{y}$ (which is independent of $\mathbf{x}$)
becomes

$$n^{-1}(1 - \theta_0)P(A)\,h^{-2} \int_A K^2((x - y)/h)\,f(y)\,dy \cdot P(A)^{-1}$$

and the integral of this over $x$ is

$$(1 - \theta_0)P(A) I_2\, n^{-1} h^{-1} . \qquad (A.6)$$
The remaining contribution to (10) is the variance, over $\mathbf{x}$, of
$n^{-1}\{n\theta_0\sigma_h(x) + n(1 - \theta_0)\gamma_h(x)\}$, that is,

$$\theta_0\sigma_h(x) + (1 - \theta_0)h^{-1} \int_{\bar{A}} K((x - z)/h)\,\sigma_h(z)\,dz - (1 - \theta_0)S_h(\bar{A})\{hP(\bar{A})\}^{-1} \int_{\bar{A}} K((x - z)/h)\,f(z)\,dz .$$
A 


Given $n_0$, we have the following.

(i)  $\operatorname{var} \sigma_h(x) = (n_0 h)^{-1} I_2 f(x) + o(n_0^{-1} h^{-1})$,

so $\int \operatorname{var} \sigma_h(x)\,dx = (n_0 h)^{-1} I_2 + o(n_0^{-1} h^{-1})$.

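(ii)

$$\operatorname{var}\Bigl\{h^{-1} \int_{\bar{A}} K((x - z)/h)\,\sigma_h(z)\,dz\Bigr\} = E\Bigl\{h^{-2} \int_{\bar{A}} \int_{\bar{A}} K((x - z)/h)K((x - y)/h)\,\sigma_h(z)\sigma_h(y)\,dy\,dz\Bigr\}$$

$$\simeq n_0^{-1} h^{-4}\Bigl\{\int f(u) \int_{\bar{A}} \int_{\bar{A}} K((x - z)/h)K((x - y)/h)K((z - u)/h)K((y - u)/h)\,dy\,dz\,du\Bigr\} .$$

Integrate over $x$ and substitute

$$\frac{x - z}{h} = w , \quad \frac{y - u}{h} = v , \quad \frac{z - u}{h} = t , \quad \text{so that} \quad \frac{x - y}{h} = w + t - v .$$

Thus

$$\int \operatorname{var}\Bigl\{h^{-1} \int_{\bar{A}} K((x - z)/h)\,\sigma_h(z)\,dz\Bigr\}\,dx \simeq n_0^{-1} h^{-1} \int_{\bar{A}} f(y)\,dy \iiint K(v)K(w)K(t)K(w + t - v)\,dv\,dt\,dw$$

$$= (n\theta_0 h)^{-1} P(\bar{A}) I_3 ,$$

where $I_3$ is the triple integral.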

(iii)

$$\int \operatorname{cov}\Bigl\{\sigma_h(x),\ h^{-1} \int_{\bar{A}} K((x - z)/h)\,\sigma_h(z)\,dz\Bigr\}\,dx \simeq n_0^{-1} h^{-3} \iiint_{z \in \bar{A}} K((x - z)/h)K((x - u)/h)K((z - u)/h)\,f(u)\,dz\,dx\,du .$$

Substitute $u = z - vh$, $x = z + wh$. Thus, the right hand side is

$$n_0^{-1} h^{-1} \iint K(w)K(v + w)K(v)\,dv\,dw \cdot \int_{\bar{A}} f(z - vh)\,dz \simeq n_0^{-1} h^{-1} P(\bar{A}) I_4 = (nh\theta_0)^{-1} P(\bar{A}) I_4 ,$$

where $I_4$ is the double integral.

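(iv)

$$\operatorname{cov}\{\sigma_h(x), S_h(\bar{A})\} = \int_{\bar{A}} \operatorname{cov}(\sigma_h(x), \sigma_h(y))\,dy = \int_{\bar{A}} \Bigl\{n_0^{-1} h^{-2} \int K((x - z)/h)K((y - z)/h)\,f(z)\,dz + o(n_0^{-1} h^{-1})\Bigr\}\,dy .$$

Substitute $(x - z)/h = u$, $(y - x)/h = v$. Then

$$\operatorname{cov}(\sigma_h(x), S_h(\bar{A})) = O(n_0^{-1}) = O(n^{-1}) = o(n^{-1} h^{-1}) .$$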

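(v)

$$\operatorname{var}(S_h(\bar{A})) = \int_{\bar{A}} \int_{\bar{A}} \operatorname{cov}(\sigma_h(y), \sigma_h(z))\,dy\,dz \simeq n_0^{-1} h^{-2} \int_{\bar{A}} \int_{\bar{A}} \Bigl\{\int K((y - x)/h)K((z - x)/h)\,f(x)\,dx\Bigr\}\,dy\,dz$$

$$= O(n^{-1}) = o(n^{-1} h^{-1}) , \text{ as in (iv).}$$

(vi)

$$\operatorname{cov}\Bigl(h^{-1} \int_{\bar{A}} K((x - z)/h)\,\sigma_h(z)\,dz,\ S_h(\bar{A})\Bigr) = h^{-1} \int_{\bar{A}} \int_{\bar{A}} K((x - z)/h) \operatorname{cov}(\sigma_h(z), \sigma_h(y))\,dz\,dy$$

$$\simeq n_0^{-1} h^{-3} \int \Bigl\{\int_{\bar{A}} \int_{\bar{A}} K((x - z)/h)K((z - u)/h)K((y - u)/h)\,f(u)\,dz\,dy\Bigr\}\,du$$

$$= O(n^{-1}) = o(n^{-1} h^{-1}) , \text{ also.}$$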

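Thus the dominant term in the integrated variance over $\mathbf{x}$ is obtained from (i),
(ii) and (iii). We obtain

$$\int E_{\mathbf{n}} \operatorname{var}_{\mathbf{x}} \hat{f}_B(x)\,dx \simeq (nh)^{-1}\Bigl\{\theta_0 I_2 + \Bigl(\frac{(1 - \theta_0)^2}{\theta_0} I_3 + 2(1 - \theta_0) I_4\Bigr) P(\bar{A})\Bigr\} .$$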

Combining this with (A.6) we obtain

$$\int \operatorname{var} \hat{f}_B(x)\,dx = G_B n^{-1} h^{-1} + o(n^{-1} h^{-1}) ,$$

where

$$G_B = I_2\{\theta_0 + (1 - \theta_0)P(A)\} + (1 - \theta_0)P(\bar{A})\{(1 - \theta_0) I_3/\theta_0 + 2 I_4\} . \qquad (A.7)$$

With the addition of (A.2), we have

$$\int \operatorname{var} \hat{f}_A(x)\,dx = G_A n^{-1} h^{-1} + o(n^{-1} h^{-1}) ,$$

where

$$G_A = G_B + (1 - \theta_0)P(\bar{A}) I_2/r . \qquad (A.8)$$